Partitioning Data on Features or Samples in Communication-Efficient Distributed Optimization?

Authors

  • Chenxin Ma
  • Martin Takáč
Abstract

In this paper we study how the way data is partitioned affects distributed optimization. The original DiSCO algorithm [Communication-Efficient Distributed Optimization of Self-Concordant Empirical Loss, Yuchen Zhang and Lin Xiao, 2015] partitions the input data by samples. We describe how the original algorithm must be modified to allow partitioning by features, and we show its efficiency both in theory and in practice.
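To make the two layouts concrete, the following is a minimal sketch, not the DiSCO algorithm itself. It assumes a plain least-squares loss and simulates the nodes in a single process; all function names are hypothetical and chosen only for illustration. It shows that splitting the data matrix by rows (samples) or by columns (features) yields the same gradient, while the vector that must be summed across nodes has length d in the first case and length n in the second.

```python
# Illustrative sketch only: gradient of (1 / 2n) * ||Xw - y||^2 under two
# data layouts. Nodes are simulated by loops; names are hypothetical.

def full_gradient(X, y, w):
    """Reference gradient computed on a single machine."""
    n, d = len(X), len(w)
    r = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
    return [sum(X[i][j] * r[i] for i in range(n)) / n for j in range(d)]

def gradient_by_samples(X, y, w, num_nodes):
    """Each node owns a subset of rows and the full weight vector."""
    n, d = len(X), len(w)
    total = [0.0] * d
    for k in range(num_nodes):
        rows = range(k, n, num_nodes)  # rows owned by node k
        rk = [sum(X[i][j] * w[j] for j in range(d)) - y[i] for i in rows]
        local = [sum(X[i][j] * r for i, r in zip(rows, rk)) for j in range(d)]
        # Stands in for an all-reduce of a length-d vector.
        total = [t + g for t, g in zip(total, local)]
    return [t / n for t in total]

def gradient_by_features(X, y, w, num_nodes):
    """Each node owns a subset of columns and the matching weights."""
    n, d = len(X), len(w)
    blocks = [range(k, d, num_nodes) for k in range(num_nodes)]
    pred = [0.0] * n
    for cols in blocks:
        partial = [sum(X[i][j] * w[j] for j in cols) for i in range(n)]
        # Stands in for an all-reduce of a length-n vector.
        pred = [p + q for p, q in zip(pred, partial)]
    r = [p - t for p, t in zip(pred, y)]
    grad = [0.0] * d
    for cols in blocks:
        for j in cols:  # each node fills in only its own coordinates
            grad[j] = sum(X[i][j] * r[i] for i in range(n)) / n
    return grad
```

Under these assumptions, the feature layout communicates less per gradient evaluation when the number of features d greatly exceeds the number of samples n, and the sample layout communicates less in the opposite regime.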

Similar resources

Scaling Queries over Big RDF Graphs with Semantic Hash Partitioning

Massive volumes of big RDF data are growing beyond the performance capacity of conventional RDF data management systems operating on a single node. Applications using large RDF data demand efficient data partitioning solutions for supporting RDF data access on a cluster of compute nodes. In this paper we present a novel semantic hash partitioning approach and implement a Semantic HAsh Partition...

Scaling Distributed Machine Learning with System and Algorithm Co-design

For a lot of important machine learning problems, due to the rapid growth of data and the ever increasing model complexity, which often manifests itself in the large number of model parameters, no single machine can solve them fast enough. Therefore, distributed optimization and inference is becoming more and more inevitable for solving large scale machine learning problems in both academia and...

ENERGY AWARE DISTRIBUTED PARTITIONING DETECTION AND CONNECTIVITY RESTORATION ALGORITHM IN WIRELESS SENSOR NETWORKS

 Mobile sensor networks rely heavily on inter-sensor connectivity for collection of data. Nodes in these networks monitor different regions of an area of interest and collectively present a global overview of some monitored activities or phenomena. A failure of a sensor leads to loss of connectivity and may cause partitioning of the network into disjoint segments. A number of approaches have be...

Communication Lower Bounds for Distributed Convex Optimization: Partition Data on Features

Recently, there has been an increasing interest in designing distributed convex optimization algorithms under the setting where the data matrix is partitioned on features. Algorithms under this setting sometimes have many advantages over those under the setting where data is partitioned on samples, especially when the number of features is huge. Therefore, it is important to understand the inhe...

PMJoin: Optimizing Distributed Multi-way Stream Joins by Stream Partitioning

In emerging data stream applications, data sources are typically distributed. Evaluating multi-join queries over streams from different sources may incur large communication cost. As queries run continuously, the precious bandwidths would be aggressively consumed without careful optimization of operator ordering and placement. In this paper, we focus on the optimization of continuous multi-join...

Journal title:
  • CoRR

Volume: abs/1510.06688

Publication date: 2015